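The shape of many objects in the built environment is dictated by their relationship to the human body: how will a person interact with this object? Existing data-driven generative models of 3D shapes produce plausible objects but do not reason about the relationship of those objects to the human body. In this paper, we learn body-aware generative models of 3D shapes. Specifically, we train generative models of chairs, a ubiquitous shape category, which can be conditioned on a given body shape or sitting pose. The body-shape-conditioned models produce chairs that are comfortable for a person with the given body shape; the pose-conditioned models produce chairs that accommodate the given sitting pose. To train these models, we define a "sitting pose matching" metric and a novel "sitting comfort" metric. Computing these metrics requires an expensive optimization that seats the body in the chair, which is too slow to serve as a loss function for training a generative model. We therefore train neural networks to approximate these metrics efficiently. We use our approach to train three body-aware generative shape models: a structured part-based generator, a point cloud generator, and an implicit surface generator. In all cases, our approach produces models that adapt their output chair shapes to the input human body specifications.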
translated by 谷歌翻译
Semantic segmentation usually benefits from global context, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters in these respects, we present a simple yet powerful semantic segmentation architecture, termed IncepFormer. IncepFormer makes two critical contributions. First, it introduces a novel pyramid-structured Transformer encoder that harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for the final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions, and a light-weight feed-forward module in each self-attention layer, efficiently obtaining rich local multi-scale object features. Extensive experiments on five benchmarks show that IncepFormer is superior to state-of-the-art methods in both accuracy and speed: 1) IncepFormer-S achieves 47.7% mIoU on ADE20K, outperforming the existing best method by 1% with only half the parameters and fewer FLOPs; 2) IncepFormer-B achieves 82.0% mIoU on the Cityscapes dataset with 39.6M parameters. Code is available at github.com/shendu0321/IncepFormer.
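As a rough illustration of what "Inception-like" depth-wise convolutions mean, the sketch below runs parallel depth-wise branches with different kernel sizes over a toy 1-D signal and concatenates the results along the channel axis. It is not the IncepFormer implementation: the function names, the 1-D setting, the kernel sizes, and the fixed averaging weights are all assumptions made for illustration, and a real model would learn its kernels and operate on 2-D feature maps.

```python
from typing import List, Sequence

Signal = List[List[float]]  # layout: [channels][length]


def depthwise_conv1d(x: Signal, kernels: List[List[float]]) -> Signal:
    """Apply one kernel per channel (depth-wise), zero-padded to keep the length."""
    out = []
    for channel, kernel in zip(x, kernels):
        pad = len(kernel) // 2  # assumes odd kernel sizes
        padded = [0.0] * pad + channel + [0.0] * pad
        out.append([
            sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(channel))
        ])
    return out


def inception_like_block(x: Signal, kernel_sizes: Sequence[int] = (1, 3, 5)) -> Signal:
    """Run parallel depth-wise branches at several scales and concatenate them."""
    branches: Signal = []
    for size in kernel_sizes:
        # Hypothetical fixed averaging kernels; a trained model would learn these.
        kernels = [[1.0 / size] * size for _ in x]
        branches.extend(depthwise_conv1d(x, kernels))
    return branches


features = inception_like_block([[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]])
```

Each branch sees the same input at a different receptive-field size, so the concatenated output mixes fine and coarse local context while keeping one kernel per channel.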
Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative speech synthesis, where the system's output is synthesized by a neural vocoder after an inherently lossy feature-denoising step. In this paper, we propose a denoising vocoder (DeVo) approach, where a vocoder accepts noisy representations and learns to directly synthesize clean speech. We leverage rich representations from self-supervised learning (SSL) speech models to discover relevant features. We conduct a candidate search across 15 potential SSL front-ends and subsequently train our vocoder adversarially with the best SSL configuration. Additionally, we demonstrate a causal version capable of running on streaming audio with 10ms latency and minimal performance degradation. Finally, we conduct both objective evaluations and subjective listening studies to show our system improves objective metrics and outperforms an existing state-of-the-art SE model subjectively.
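For context on the time-frequency masking baseline mentioned above (the conventional approach, not the proposed DeVo vocoder), a minimal sketch follows. It assumes magnitude spectrograms stored as nested lists, assumes noisy magnitude is the sum of clean and noise magnitudes, and computes an oracle ratio-style mask from known clean and noise values; in a real SE network the mask would be predicted from the noisy input alone. All names are hypothetical.

```python
from typing import List

Spectrogram = List[List[float]]  # layout: [frames][frequency bins], magnitudes >= 0


def ratio_mask(clean: Spectrogram, noise: Spectrogram, eps: float = 1e-12) -> Spectrogram:
    """Oracle mask in [0, 1]: the share of each time-frequency bin owed to speech."""
    return [
        [s / (s + n + eps) for s, n in zip(clean_frame, noise_frame)]
        for clean_frame, noise_frame in zip(clean, noise)
    ]


def apply_mask(noisy: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Suppress noise by scaling each time-frequency bin of the noisy input."""
    return [
        [x * m for x, m in zip(noisy_frame, mask_frame)]
        for noisy_frame, mask_frame in zip(noisy, mask)
    ]


clean = [[1.0, 0.5], [0.0, 2.0]]
noise = [[1.0, 0.5], [1.0, 0.0]]
# Assumption for this toy example: magnitudes add.
noisy = [[s + n for s, n in zip(cf, nf)] for cf, nf in zip(clean, noise)]
mask = ratio_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
```

Because masking only rescales the noisy bins, it cannot restore content that noise has obliterated, which is one motivation for generative alternatives.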
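When we use algorithms to produce recommendations, we typically think of these recommendations as providing helpful information, such as when risk assessments are presented to judges or doctors. But when decision-makers obtain a recommendation, they may not only react to the information. They may treat the recommendation as a default action, which makes it costly for them to deviate, for example when a judge is reluctant to overrule a high-risk assessment of a defendant or a doctor fears the consequences of deviating from recommended procedures. In this paper, we consider the effect and design of recommendations when they affect choices not just by shifting beliefs, but also by altering preferences. We motivate our model from institutional factors, such as a desire to avoid audits, and from well-established models in behavioral science that predict loss aversion relative to a reference point, which here is set by the algorithm. We show that recommendation-dependent preferences create inefficiencies in which the decision-maker is overly responsive to the recommendation, which changes the optimal design of the algorithm towards providing less conservative recommendations. As a potential remedy, we discuss an algorithm that strategically withholds recommendations, and show how it can improve the quality of final decisions.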
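The idea of recommendation-dependent preferences can be made concrete with a toy binary decision. The sketch below is not the paper's model: the payoff form, the `threshold` of 0.5, the fixed `deviation_cost`, and the function name are all assumptions chosen for illustration. The decision-maker holds a belief about risk, and choosing an action other than the recommended one incurs an extra cost; passing `None` stands for a withheld recommendation.

```python
from typing import Optional


def choose_action(
    p_high_risk: float,
    recommendation: Optional[int],
    deviation_cost: float,
    threshold: float = 0.5,
) -> int:
    """Return 1 (intervene) or 0 (do not), maximizing payoff minus deviation cost."""
    utilities = {}
    for action in (0, 1):
        # Intervening pays off only if the believed risk exceeds the threshold.
        payoff = (p_high_risk - threshold) if action == 1 else 0.0
        # Deviating from an issued recommendation is penalized; no penalty if withheld.
        deviates = recommendation is not None and action != recommendation
        utilities[action] = payoff - (deviation_cost if deviates else 0.0)
    return max((0, 1), key=lambda a: utilities[a])  # ties resolve to action 0


belief_only = choose_action(0.4, recommendation=1, deviation_cost=0.0)
with_cost = choose_action(0.4, recommendation=1, deviation_cost=0.3)
withheld = choose_action(0.4, recommendation=None, deviation_cost=0.3)
```

With a belief of 0.4, the decision-maker would not intervene on information alone, but follows a recommendation to intervene once deviating is costly; withholding the recommendation restores the belief-driven choice.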
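Unsustainable fishing practices worldwide pose a major threat to marine resources and ecosystems. Identifying vessels that evade monitoring systems, known as "dark vessels", is key to managing and securing the health of marine environments. With the rise of satellite-based synthetic aperture radar (SAR) imaging and modern machine learning (ML), it is now possible to automate the detection of dark vessels day or night, under all-weather conditions. SAR images, however, require domain-specific treatment and are not widely accessible to the ML community. Moreover, the objects (vessels) are small and sparse, which challenges traditional computer vision approaches. We present the largest labeled dataset for training ML models to detect and characterize vessels from SAR. xView3-SAR consists of nearly 1,000 analysis-ready SAR images from the Sentinel-1 mission that are, on average, 29,400-by-24,400 pixels each. The images are annotated using a combination of automated and manual analysis. Co-located bathymetry and wind state rasters accompany every SAR image. We provide an overview of the results of the xView3 Computer Vision Challenge, an international competition using xView3-SAR for large-scale ship detection and characterization. We release the data (https://iuu.xview.us/) and code (https://github.com/diux-xview) to support the ongoing development and evaluation of ML approaches for this important application.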
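An algorithm is presented for constructing high-order signed distance fields for two-phase materials imaged with computed tomography. The signed distance field is high-order in that it is free of the quantization artifact associated with the distance transform of sampled signals. The narrowband is solved using a closest point algorithm extended to implicit embeddings that are not signed distance fields. The high-order fast sweeping algorithm is used to extend the narrowband to the remainder of the domain. The order of accuracy of the narrowband and extension methods is verified on ideal implicit surfaces. The method is applied to ten excised cubes of bovine trabecular bone. Localization of the surface, estimation of phase densities, and local morphometry are validated with these specimens. Because the embedding is high-order, gradients, and thus curvatures, can be accurately estimated locally in the image data.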
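The quantization artifact mentioned above can be seen in one dimension. The sketch below compares a distance computed from the thresholded (binary) samples with a distance to a zero crossing located between samples. It deliberately swaps in plain linear interpolation for the paper's closest point and high-order fast sweeping algorithms, returns unsigned distances only, and uses hypothetical function names and a made-up embedding `3 * (x - 0.37)` that is not a signed distance field.

```python
from typing import List


def binary_distance(xs: List[float], values: List[float], i: int) -> float:
    """Distance from sample i to the nearest sample of opposite sign.

    This mimics a distance transform of the thresholded signal, so the
    result is quantized to multiples of the grid spacing.
    """
    inside = values[i] > 0
    return min(abs(xs[j] - xs[i]) for j in range(len(xs)) if (values[j] > 0) != inside)


def subgrid_distance(xs: List[float], values: List[float], i: int) -> float:
    """Distance from sample i to the nearest linearly interpolated zero crossing."""
    crossings = []
    for a in range(len(xs) - 1):
        fa, fb = values[a], values[a + 1]
        if (fa > 0) != (fb > 0):
            # Root of the line through (xs[a], fa) and (xs[a + 1], fb).
            crossings.append(xs[a] - fa * (xs[a + 1] - xs[a]) / (fb - fa))
    return min(abs(c - xs[i]) for c in crossings)


xs = [0.1 * k for k in range(11)]
values = [3.0 * (x - 0.37) for x in xs]  # true surface at x = 0.37
coarse = binary_distance(xs, values, 0)
fine = subgrid_distance(xs, values, 0)
```

For the sample at the origin, the binary estimate snaps to the grid while the interpolated estimate lands on the true surface position, because the embedding here is linear; curved embeddings would need higher-order schemes such as those the abstract describes.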